2,093 research outputs found

    TensorLayer: A Versatile Library for Efficient Deep Learning Development

    Deep learning has enabled major advances in computer vision, natural language processing, and multimedia, among many other fields. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning the optimization process, and preprocessing and organizing data. TensorLayer is a versatile Python library that aims to help researchers and engineers develop deep learning systems efficiently. It offers rich abstractions for neural networks, model and data management, and a parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub and has helped people from academia and industry develop real-world applications of deep learning.
    Comment: ACM Multimedia 201
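    To make the abstract's notion of layer abstractions concrete, here is a minimal sketch of building a small classifier by stacking TensorLayer layers. It follows the style of the TensorLayer 1.x tutorials on top of TensorFlow 1.x; treat the exact signatures as illustrative rather than authoritative, since the API has evolved across releases.

        import tensorflow as tf
        import tensorlayer as tl

        # Stack layers in the TensorLayer 1.x style: each layer wraps the previous one.
        x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
        network = tl.layers.InputLayer(x, name='input')
        network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu1')
        network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
        network = tl.layers.DenseLayer(network, n_units=10, act=tf.identity, name='output')
        y = network.outputs  # logits tensor, fed into a loss and optimizer as usual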

    Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

    Resource provisioning in multi-tenant stream processing systems faces the dual challenges of keeping resource utilization high (without over-provisioning) and ensuring performance isolation. In our common production use cases, where streaming workloads have to meet latency targets and avoid breaching service-level agreements, existing solutions are incapable of handling the wide variability of user needs. Our framework, called Cameo, uses fine-grained stream processing (inspired by actor computation models) and is able to provide high resource utilization while meeting latency targets. Cameo dynamically calculates and propagates event priorities based on user latency targets and query semantics. Experiments on Microsoft Azure show that, compared to the state of the art, the Cameo framework: i) reduces query latency by 2.7X in single-tenant settings, ii) reduces query latency by 4.6X in multi-tenant scenarios, and iii) weathers transient workload spikes.
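    The core mechanism described above is deadline-driven priority scheduling of events. The sketch below shows one way such a scheduler could be organized; the priority formula, class names, and fields are hypothetical illustrations, not Cameo's actual implementation.

        import heapq
        import time

        # Hypothetical sketch of deadline-based event scheduling in the spirit of Cameo:
        # an event's priority is the latest time its operator may start while still
        # meeting the query's latency target (earlier deadline = higher priority).
        class Event:
            def __init__(self, query_id, arrival, latency_target, downstream_cost):
                self.query_id = query_id
                self.deadline = arrival + latency_target - downstream_cost

        class Scheduler:
            def __init__(self):
                self._queue = []  # min-heap ordered by deadline

            def submit(self, event):
                heapq.heappush(self._queue, (event.deadline, id(event), event))

            def next_event(self):
                # Earliest-deadline-first: run the most urgent event next.
                return heapq.heappop(self._queue)[2] if self._queue else None

        sched = Scheduler()
        now = time.time()
        sched.submit(Event("q1", now, latency_target=0.5, downstream_cost=0.20))
        sched.submit(Event("q2", now, latency_target=0.1, downstream_cost=0.02))
        print(sched.next_event().query_id)  # "q2": the tighter deadline runs first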

    OpenPARF: An Open-Source Placement and Routing Framework for Large-Scale Heterogeneous FPGAs with Deep Learning Toolkit

    This paper proposes OpenPARF, an open-source placement and routing framework for large-scale FPGA designs. OpenPARF is implemented with the deep learning toolkit PyTorch and supports massive parallelization on GPU. The framework proposes a novel asymmetric multi-electrostatic field system to solve FPGA placement. It considers fine-grained routing resources inside configurable logic blocks (CLBs) for FPGA routing and supports large-scale irregular routing resource graphs. Experimental results on the ISPD 2016 and ISPD 2017 FPGA contest benchmarks and on industrial benchmarks demonstrate that OpenPARF achieves a 0.4-12.7% improvement in routed wirelength and more than a 2× speedup in placement. We believe that OpenPARF can pave the way for developing FPGA physical design engines and stimulate further research on related topics.
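    One reason a deep learning toolkit is attractive for placement is that a differentiable objective can be minimized with autograd and runs on GPU without extra code. The toy below minimizes a simplistic wirelength-plus-repulsion objective with PyTorch; it is only an illustration of the approach, not OpenPARF's asymmetric multi-electrostatic formulation.

        import torch

        # Toy gradient-based placement sketch (illustrative; not OpenPARF's model).
        torch.manual_seed(0)
        num_cells, num_nets = 64, 96
        nets = torch.randint(0, num_cells, (num_nets, 2))   # random 2-pin nets
        pos = torch.rand(num_cells, 2, requires_grad=True)  # cell coordinates in [0, 1]^2

        optimizer = torch.optim.Adam([pos], lr=0.02)
        for step in range(200):
            optimizer.zero_grad()
            # Smooth wirelength proxy: squared pin-to-pin distance per net.
            wirelength = ((pos[nets[:, 0]] - pos[nets[:, 1]]) ** 2).sum()
            # Crude density surrogate: Gaussian repulsion discourages cells piling up.
            diff = pos.unsqueeze(0) - pos.unsqueeze(1)
            repulsion = torch.exp(-(diff ** 2).sum(dim=-1) / 0.01).sum() - num_cells
            (wirelength + 0.05 * repulsion).backward()
            optimizer.step()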

    KungFu: Making Training in Distributed Machine Learning Adaptive

    When using distributed machine learning (ML) systems to train models on a cluster of worker machines, users must configure a large number of parameters: hyper-parameters (e.g. the batch size and the learning rate) affect model convergence, while system parameters (e.g. the number of workers and their communication topology) impact training performance. In current systems, adapting such parameters during training is ill-supported. Users must set system parameters at deployment time and provide fixed adaptation schedules for hyper-parameters in the training program. We describe KungFu, a distributed ML library for TensorFlow that is designed to enable adaptive training. KungFu allows users to express high-level Adaptation Policies (APs) that describe how to change hyper- and system parameters during training. APs take real-time monitored metrics (e.g. signal-to-noise ratios and noise scale) as input and trigger control actions (e.g. cluster rescaling or synchronisation strategy updates). For execution, APs are translated into monitoring and control operators, which are embedded in the dataflow graph. APs exploit an efficient asynchronous collective communication layer, which ensures concurrency and consistency of monitoring and adaptation operations.
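    To make the Adaptation Policy idea concrete, the sketch below shows a policy object that watches a monitored metric and requests a control action. The class and method names are invented for illustration and are not KungFu's actual API.

        # Hypothetical Adaptation Policy sketch (invented names; not KungFu's API).
        # A policy consumes monitored training metrics and emits control actions,
        # e.g. rescaling the cluster or switching the synchronisation strategy.
        class GradientNoiseScalePolicy:
            def __init__(self, scale_threshold, max_workers):
                self.scale_threshold = scale_threshold
                self.max_workers = max_workers

            def on_metrics(self, step, metrics, cluster_size):
                """Return a control action, or None to keep the current configuration."""
                noise_scale = metrics.get("gradient_noise_scale")
                if noise_scale is None:
                    return None
                # A large noise scale suggests larger effective batches would help,
                # so ask the runtime to scale out (bounded by max_workers).
                if noise_scale > self.scale_threshold and cluster_size < self.max_workers:
                    return {"action": "rescale",
                            "new_size": min(cluster_size * 2, self.max_workers)}
                return None

        policy = GradientNoiseScalePolicy(scale_threshold=1000.0, max_workers=32)
        print(policy.on_metrics(step=100,
                                metrics={"gradient_noise_scale": 2500.0},
                                cluster_size=8))  # {'action': 'rescale', 'new_size': 16}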
